Probabilistic boosting tree framework for learning discriminative models

ABSTRACT

A probabilistic boosting tree framework for computing two-class and multi-class discriminative models is disclosed. In the learning stage, the probabilistic boosting tree (PBT) automatically constructs a tree in which each node combines a number of weak classifiers (e.g., evidence, knowledge) into a strong classifier or conditional posterior probability. The PBT approaches the target posterior distribution by data augmentation (e.g., tree expansion) through a divide-and-conquer strategy. In the testing stage, the conditional probability is computed at each tree node based on the learned classifier which guides the probability propagation in its sub-trees. The top node of the tree therefore outputs the overall posterior probability by integrating the probabilities gathered from its sub-trees. In the training stage, a tree is recursively constructed in which each tree node is a strong classifier. The input training set is divided into two new sets, left and right ones, according to the learned classifier. Each set is then used to train the left and right sub-trees recursively.

CROSS REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Application Ser. No. 60/660,136, filed on Mar. 9, 2005, which is incorporated by reference in its entirety.

FIELD OF THE INVENTION

The present invention is directed to a probabilistic boosting tree framework for learning discriminative models, and more particularly, to a probabilistic boosting tree framework for computing two-class and multi-class discriminative models.

BACKGROUND OF THE INVENTION

The task of classifying/recognizing, detecting and clustering general objects in natural scenes is extremely challenging. The difficulty is due to many reasons: large intra-class variation and inter-class similarity, articulation and motion, different lighting conditions, orientations/viewing directions, and the complex configurations of different objects. FIG. 1 shows a multitude of different images. The first row 102 of FIG. 1 displays some face images. The rest of the rows 104-110 show some typical images from the Caltech 101 categories of objects. Some of the objects are highly non-rigid, and some of the objects in the same category bear little similarity to each other. For the categorization task, very high-level knowledge is required to put different instances of a class into the same category.

The problem of general scene understanding can be viewed in two aspects: modeling and computing. Modeling addresses the problem of how to learn/define the statistics of general patterns/objects. Computing tackles the inference problem. Let x be an image sample and let y be its interpretation. Ideally, the generative models p(x|y) are obtained for a pattern to measure the statistics about any sample x. Unfortunately, not only are such generative models often out of reach, but they also create large computational burdens in the computing stage. For example, faces are considered a relatively easy class to study. Yet, there is no existing generative model which captures all the variations of a face, such as multi-view, shadow, expression, occlusion, and hair style. Some sample faces can be seen in the first row 102 of FIG. 1. Alternatively, a discriminative model p(y|x) is learned directly, in which y is simply a variable saying "yes" or "no", or a class label.

A known technique referred to as AdaBoost and its variants have been successfully applied to many problems in vision and machine learning. AdaBoost approaches the posterior p(y|x) by selecting and combining a set of weak classifiers into a strong classifier. However, there are several problems with the current AdaBoost method. First, though it asymptotically converges to the target distribution, it may need to pick hundreds of weak classifiers, which poses a huge computational burden. Second, the order in which features are picked in the training stage is not preserved. The order of a set of features may correspond to high-level semantics and, thus, is very important for the understanding of objects/patterns. Third, the re-weighting scheme of AdaBoost may cause samples previously classified correctly to be misclassified again. Fourth, though extensions from two-class to multi-class classification have been proposed, learning weak classifiers in the multi-class case using output coding is more difficult and computationally expensive.

Another known method, typically referred to as AdaTree, combines AdaBoost with a decision tree. The main goal of the AdaTree method is to speed up the AdaBoost method by pruning. The AdaTree method learns a strong classifier by combining a set of weak classifiers into a tree structure, but it does not address multi-class classification.

A number of approaches exist to handle object classification and detection. Cascade approaches used together with AdaBoost have been shown to be effective in rare event detection. The cascade method can be viewed as a special case of the method of the present invention. In the cascade, a threshold is picked such that all the positive samples are pushed to the right side of the tree. However, pushing positives to the right side may cause a large false positive rate, especially when the positives and negatives are hard to separate. The method of the present invention naturally divides the training set into two parts. In the case where there are many more negative samples than positive samples, most of the negatives are passed into leaf nodes close to the top. Deep tree leaves focus on classifying the positives and negatives which are hard to separate.

Decision trees have been widely used in vision and artificial intelligence. In a traditional decision tree, each node is a weak decision maker, and thus the decision at each node is more random. In contrast, in the present invention, each tree node is a strong decision maker which learns a distribution q(y|x). Other approaches include A*, generative models, EM, and grammar and semantics. There is a need for a framework that is capable of learning discriminative models for use in multi-class classification without being computationally burdensome.

SUMMARY OF THE INVENTION

The present invention is directed to a method for localization of an object in an image. A probabilistic boosting tree is constructed in which each node combines a number of weak classifiers into a strong classifier or conditional posterior probability. At least one input image containing the object to be localized is received. A bounding box is identified in the input image in which the object should reside based on the conditional posterior probability. A probability value for the bounding box is computed based on the likelihood that the object in fact resides at that location. Bounding boxes and probability values are determined for different locations in the input image. The bounding box with the highest computed probability is selected as the location where the object resides.

The present invention is also directed to a method for detecting an object in an image. A probabilistic boosting tree is constructed in which each node combines a number of weak classifiers into a strong classifier or conditional posterior probability. At least one input image is received. A bounding box is identified in the at least one input image in which the object may reside based on the conditional posterior probability. A probability value for the bounding box is computed based on the likelihood that the object resides in the image. The probability is compared to a predetermined threshold. The bounding box is maintained if the probability is above the predetermined threshold. Bounding boxes and probability values are determined for different locations in the input image. A determination is made that the object resides in the image if the probability for at least one bounding box is above the predetermined threshold.

The present invention is also directed to a method of classifying images of objects into different image categories. A probabilistic boosting tree is recursively constructed in which each tree node is a strong classifier. A discriminative model is obtained at the top of the tree, and each level of the tree comprises an augmented variable. An input training set is divided into two new sets according to a learned classifier. The two new sets are used to train left and right sub-trees recursively, such that clustering is automatically formed in a hierarchical way. An appropriate number of classifications is outputted based on the number of clusters formed.

BRIEF DESCRIPTION OF THE DRAWINGS

Preferred embodiments of the present invention will be described below in more detail, wherein like reference numerals indicate like elements, with reference to the accompanying drawings:

FIG. 1 illustrates examples of images of natural scenes and common objects;

FIG. 2 is a block diagram of a system for implementing a probabilistic boosting tree in accordance with the present invention;

FIG. 3 outlines a method for training a boosting tree in accordance with the present invention;

FIG. 4 illustrates an example of how a probabilistic boosting tree is learned and training samples are divided in accordance with the present invention;

FIG. 5 outlines a method for testing a probabilistic boosting tree in accordance with the present invention;

FIG. 6 illustrates an example of a probabilistic model of a tree in accordance with the present invention;

FIG. 7 outlines a method for training a multi-class probabilistic boosting tree in accordance with the present invention;

FIG. 8 illustrates histograms of four object images in intensity and three Gabor filtering results in accordance with the present invention;

FIG. 9 illustrates some sample images from an image set and the clusters learned in accordance with the present invention;

FIG. 10 illustrates some sample image clusters that were formed in accordance with the present invention;

FIG. 11 illustrates a still image from an input video of a heart and the resulting left ventricle detection in accordance with the present invention;

FIG. 12 shows an example of left ventricle localization in an ultrasound image in accordance with the present invention;

FIG. 13 shows an example of fetal head localization in an ultrasound image in accordance with the present invention;

FIG. 14 shows an example of fetal abdomen localization in an ultrasound image in accordance with the present invention;

FIG. 15 shows an example of fetal femur localization in an ultrasound image in accordance with the present invention;

FIG. 16 shows an example of rectal tube detection in a computed tomography image in accordance with the present invention;

FIG. 17 shows an enlarged view of the rectal tube of FIG. 16 in accordance with the present invention; and

FIG. 18 shows examples of face detection in accordance with the present invention.

DETAILED DESCRIPTION

The present invention is directed to a probabilistic boosting tree framework for computing two-class and multi-class discriminative models. In the learning stage, the probabilistic boosting tree (PBT) automatically constructs a tree in which each node combines a number of weak classifiers (e.g., evidence, knowledge) into a strong classifier or conditional posterior probability. The PBT approaches the target posterior distribution by data augmentation (e.g., tree expansion) through a divide-and-conquer strategy.

In the testing stage, the conditional probability is computed at each tree node based on the learned classifier, which guides the probability propagation in its sub-trees. The top node of the tree therefore outputs the overall posterior probability by integrating the probabilities gathered from its sub-trees. Also, clustering is naturally embedded in the learning phase, and each sub-tree represents a cluster of a certain level.

In the training stage, a tree is recursively constructed in which each tree node is a strong classifier. The input training set is divided into two new sets, a left one and a right one, according to the learned classifier. Each set is then used to train the left and right sub-trees recursively. The discriminative model obtained at the top of the tree approaches the target posterior distribution by data augmentation. Each level of the tree is an augmented variable. Clustering is intrinsically embedded in the learning stage, with clusters automatically discovered and formed in a hierarchical way.

For the multi-class problem, the goal is to learn a discriminative model while keeping the hierarchical tree structure. This is done by treating the multi-class classification problem as a special two-class classification problem. At each node, either a positive or a negative label is assigned to each class so as to minimize the total entropy. Through this procedure, the multi-class and two-class learning procedures become unified. Clusters of multi-classes are again directly formed.

The general AdaBoost method and its variants learn a strong classifier by combining a set of weak classifiers

$$H(x) = \sum_{t=1}^{T} \alpha_t h_t(x),$$

in which $h_t(x)$ is a weak classifier. The total error rate

$$\varepsilon = \sum_{i} w_i \left[\, \mathrm{sign}\!\left[H(x_i)\right] \neq y_i \,\right]$$

is shown to be bounded by

$$\varepsilon \leq 2^T \prod_{t=1}^{T} \sqrt{\varepsilon_t \left(1 - \varepsilon_t\right)}, \qquad (1)$$

where $w_i$ is the probability of sample $x_i$.
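For concreteness, the following is a minimal sketch (not taken from the patent) of discrete AdaBoost with decision stumps; it shows how the strong classifier $H(x) = \sum_t \alpha_t h_t(x)$ and the sample re-weighting fit together. The stump learner and all names are illustrative.

```python
# Illustrative sketch of discrete AdaBoost with decision stumps:
# H(x) = sum_t alpha_t * h_t(x), with the weight update that gives
# misclassified samples more weight in the next round.
import numpy as np

def train_adaboost(X, y, T=50):
    """X: (n, d) features; y: (n,) labels in {-1, +1}.
    Returns a list of (alpha, feature, threshold, sign) stumps."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                 # w_i: probability of sample x_i
    learners = []
    for _ in range(T):
        best = None                         # (eps_t, feature, threshold, sign)
        for j in range(d):
            for thresh in np.unique(X[:, j]):
                for s in (+1, -1):
                    pred = s * np.where(X[:, j] > thresh, 1, -1)
                    eps = w[pred != y].sum()
                    if best is None or eps < best[0]:
                        best = (eps, j, thresh, s)
        eps_t, j, thresh, s = best
        if eps_t >= 0.5:                    # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1.0 - eps_t) / max(eps_t, 1e-12))
        pred = s * np.where(X[:, j] > thresh, 1, -1)
        w *= np.exp(-alpha * y * pred)      # misclassified samples gain weight
        w /= w.sum()
        learners.append((alpha, j, thresh, s))
    return learners

def H(learners, X):
    """Strong classifier response; sign(H(x)) is the hard decision."""
    out = np.zeros(X.shape[0])
    for alpha, j, thresh, s in learners:
        out += alpha * s * np.where(X[:, j] > thresh, 1, -1)
    return out
```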

When dealing with x_i arising from a complex distribution, ε_t quickly approaches 0.5, and the convergence becomes slow. One possible remedy lies in designing more effective weak classifiers that are better at separating the positives from the negatives. Unfortunately, it is often hard to obtain good weak classifiers, and the computational complexity of computing these classifiers and features is yet another constraint. One of the key ideas in AdaBoost is that samples incorrectly classified receive more weight the next time. Due to the update rule and normalization for D_t, previously correctly classified samples may be misclassified again and thus receive a penalty. Therefore, after some steps, weak classifiers become ineffective. Instead of putting all the weak classifiers together into a single strong classifier, a divide-and-conquer approach is used.

FIG. 2 illustrates a block diagram of a general system for implementing the probabilistic boosting tree framework in accordance with the present invention. One or more images are obtained using an input device 202, such as a camera. The images are received by a processor 204, which applies the PBT framework to each image. The PBT framework can be used to accomplish a number of tasks, as will be described in greater detail hereinafter. For example, the PBT framework can be used for object categorization or object detection. Training samples that are stored in database 206 can be used to learn and compute discriminative models. The system takes the input images and outputs a classification result. In the case of a two-class problem, the output is either positive or negative. In a multi-class problem, the output is the class to which the image belongs. The classification results are then shown on display 208.

FIG. 3 outlines a method for training a boosting tree in accordance with the present invention. For notational simplicity, the probabilities computed by each learned AdaBoost method are denoted as follows:

$$q(+1 \mid x) = \frac{\exp\{2H(x)\}}{1 + \exp\{2H(x)\}} \quad \text{and} \quad q(-1 \mid x) = \frac{\exp\{-2H(x)\}}{1 + \exp\{-2H(x)\}}. \qquad (2)$$

The algorithm is intuitive. It recursively learns a tree. At each node, a strong classifier is learned using the standard boosting algorithm. The training samples are then divided into two new sets using the learned classifier, a left one and a right one, which are then used to train a left sub-tree and a right sub-tree, respectively. The variable ε is used to control, to some degree, the overfitting problem. Samples falling in the range $\left[\frac{1}{2} - \varepsilon, \frac{1}{2} + \varepsilon\right]$ are confusing ones and are used in both the left and the right sub-trees for training. If $\varepsilon = \frac{1}{2}$, then all training samples are passed into both sub-trees, with weights re-computed based on the strong classifier; PBT then becomes similar to boosting. If ε = 0, then each sample is passed into either the right or the left sub-tree. Positive and negative samples are then almost surely separated, if there are no identical samples, but this may overfit the data.
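The following is a simplified sketch of this recursive training procedure, reusing the train_adaboost and H helpers from the sketch above. The Node class, stopping rules and default parameters are illustrative assumptions, not the patent's specification.

```python
# Simplified sketch of recursive PBT training with the epsilon band
# [1/2 - eps, 1/2 + eps] sending ambiguous samples into BOTH sub-trees.
import numpy as np

class Node:
    def __init__(self, learners=None, left=None, right=None, q_hat=0.5):
        self.learners = learners            # None marks a leaf
        self.left, self.right = left, right
        self.q_hat = q_hat                  # empirical q^(y = +1) at a leaf

def q_pos(learners, X):
    """q(+1|x) = exp{2H(x)} / (1 + exp{2H(x)}), per Equation (2)."""
    return 1.0 / (1.0 + np.exp(-2.0 * H(learners, X)))

def train_pbt(X, y, eps=0.1, depth=0, max_depth=8, min_samples=10):
    # Stop expanding on tiny or pure sets; store the empirical distribution.
    if depth >= max_depth or len(y) < min_samples or len(np.unique(y)) < 2:
        return Node(q_hat=float(np.mean(y == 1)) if len(y) else 0.5)
    learners = train_adaboost(X, y)
    q = q_pos(learners, X)
    right = q >= 0.5 - eps                  # confident or ambiguous positives
    left = q <= 0.5 + eps                   # confident or ambiguous negatives
    node = Node(learners)
    node.right = train_pbt(X[right], y[right], eps, depth + 1, max_depth, min_samples)
    node.left = train_pbt(X[left], y[left], eps, depth + 1, max_depth, min_samples)
    return node
```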

If a training set is split into two parts, the new error rate is

$$\varepsilon_{split} = \sum_{i} w_i(l) \left[ H_l(x_i) \neq y_i \right] + \sum_{i} w_i(r) \left[ H_r(x_i) \neq y_i \right] \leq \varepsilon, \qquad (3)$$

where $\varepsilon = \sum_{i} w_i \left[ H(x_i) \neq y_i \right]$.

It is straightforward to see that the equality holds when $H_l = H$ and $H_r = H$. In general, reducing the number of input samples reduces the complexity of the problem, leading to a better decision boundary.

Under this model, positive and negative samples are naturally divided into sub-groups. FIG. 4 shows an example of how a tree is learned and the training samples are divided. Samples which are hard to classify are passed further down, leading to the expansion of the tree. Clustering of positives and negatives is naturally performed. One group serves as an auxiliary variable to the other group. Since each tree node is a strong classifier, it can deal with samples having a complex distribution. Also, there is no need to pre-specify the number of clusters. The hierarchical structure of the tree allows the clusters to be reported according to different levels of discrimination.

As shown in FIG. 4, a PBT is created from a synthetic dataset 402 of 2,000 points. Weak classifiers are likelihood classifiers on features such as position and distance to 2D lines. The first level of the tree 404, 406 divides the whole set into two parts 408, 410. Set 408 contains mostly dark points, since they are away from the rest of the clouds. The tree expands on the parts where positive and negative samples are tangled. The further levels 412-418 expand from set 410 to better separate the dark points and light points of that set.

The testing stage is consistent with the training stage. FIG. 5 provides the details for computing the approximated posterior probability $\tilde{p}(y|x)$. At the root of the tree, the information from the descendants is gathered, and an overall approximated posterior distribution is reported. This approach can also be turned into a classifier which makes hard decisions: after computing q(+1|x) and q(−1|x), a decision can be made to go into the right or the left sub-tree by comparing q(+1|x) and q(−1|x), and the empirical distribution $\hat{q}(y)$ contained at the reached leaf node is then passed back to the root node of the tree. However, the advantage of using probabilities is clear: once a PBT is trained, $\tilde{p}(y|x)$ can be thresholded to balance between precision and recall.
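A sketch of the corresponding testing recursion, under the same assumptions as the training sketch above (Node, q_pos): a node descends into one sub-tree when its soft decision is confident and otherwise integrates both sub-trees, so the root returns the overall approximated posterior.

```python
# Sketch of the testing recursion of FIG. 5: return p~(y = +1 | x).
def pbt_posterior(node, x, t=0.1):
    if node.learners is None:                  # leaf: report empirical q^(y)
        return node.q_hat
    q1 = q_pos(node.learners, x[None, :])[0]   # q(+1|x); q(-1|x) = 1 - q1
    if q1 > 0.5 + t:                           # confident: right sub-tree only
        return pbt_posterior(node.right, x, t)
    if q1 < 0.5 - t:                           # confident: left sub-tree only
        return pbt_posterior(node.left, x, t)
    # Ambiguous: weight the sub-tree posteriors by the node's soft decision.
    return q1 * pbt_posterior(node.right, x, t) + \
        (1.0 - q1) * pbt_posterior(node.left, x, t)
```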

FIG. 6 illustrates an example of a probabilistic model of a tree in accordance with the present invention. Each tree node 602-612 is a strong classifier. The dark nodes 620-632 represent leaf nodes. A complex pattern x is generated by a generation process which has a set of hidden variables. PBT can be viewed in a similar light, as it performs implicit data augmentation. The goal of the learning method is to learn the posterior distribution p(y|x). Each tree level $l_i$ is an augmented variable:

$$\begin{aligned}
\tilde{p}(y \mid x) &= \sum_{l_1} \tilde{p}(y \mid l_1, x)\, q(l_1 \mid x) \\
&= \sum_{l_1, l_2} \tilde{p}(y \mid l_2, l_1, x)\, q(l_2 \mid l_1, x)\, q(l_1 \mid x) \\
&= \sum_{l_1, \ldots, l_n} \tilde{p}(y \mid l_n, \ldots, l_1, x) \cdots q(l_2 \mid l_1, x)\, q(l_1 \mid x). \qquad (4)
\end{aligned}$$

At a tree node, if the exact model can be learned, then

$$\tilde{p}(y \mid l_i, \ldots, l_1, x) = \sum_{l_{i+1}} \delta(y = l_{i+1})\, q(l_{i+1} \mid l_i, \ldots, l_1, x), \qquad (5)$$

which means the model $q(l_{i+1} \mid l_i, \ldots, l_1, x)$ perfectly predicts y and, thus, the tree stops expanding. The augmented variables $l_i$ gradually decouple y from x to make a better prediction.

A two-class boosting tree approach has been described. Traditional boosting approaches for multi-class classification require multi-class weak classifiers, which are in general much more computationally expensive to learn and compute than two-class weak classifiers. This is especially a problem when the number of classes becomes large. Interestingly, different classes of patterns are often similar to each other in certain aspects. For example, a donkey may look like a horse from a distance.

FIG. 7 outlines a method for training a multi-class boosting tree in accordance with the present invention. The method first finds the optimal feature that divides the multi-class patterns into two classes and then uses the previous two-class boosting tree method to learn the classifiers. In many instances, the first feature selected by the boosting method after transforming the multi-class problem into a two-class problem is the one chosen for splitting the multi-classes. Intuitively, the rest of the features/weak classifiers picked support the first one in making a stronger decision. Thus, the multi-class classification problem is reduced to a special two-class classification problem. Similar objects of different classes, according to the features, are grouped against the others. As the tree expansion continues, they are gradually clustered and set apart. The expansion stops when each class has been successfully separated or when there are too few training samples.
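One hedged way to realize this entropy-minimizing class split is sketched below: each class receives the ±1 pseudo-label that the majority of its samples would receive under a candidate feature/threshold, and the assignment with the lowest weighted binary entropy is kept. The quantile grid and the exact entropy weighting are illustrative choices, not the patent's specification.

```python
# Hedged sketch of turning a multi-class node into a two-class problem.
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))

def split_classes(X, c, n_thresh=16):
    """X: (n, d) features; c: (n,) integer class ids.
    Returns (feature, threshold, {class id: +1 or -1})."""
    best = None
    classes = np.unique(c)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, n_thresh)):
            resp = np.where(X[:, j] > t, 1, -1)
            # Majority pseudo-label per class under this weak decision.
            labels = {k: 1 if np.mean(resp[c == k] == 1) >= 0.5 else -1
                      for k in classes}
            y = np.array([labels[k] for k in c])
            # Total entropy of the two sample sets induced by the split.
            e = 0.0
            for side in (1, -1):
                m = resp == side
                if m.any():
                    e += m.mean() * binary_entropy(np.mean(y[m] == 1))
            if best is None or e < best[0]:
                best = (e, j, t, labels)
    _, j, t, labels = best
    return j, t, labels
```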

The testing procedure for a multi-class PBT is nearly the same as that for the two-class problem. Again, the top node of the tree integrates all the probabilities from its sub-trees and outputs an overall posterior probability. The scale of the problem is O(log(n)) with respect to the number of classes, n. The multi-class PBT is very efficient in computing the probability due to the hierarchical structure. This is important when recognizing hundreds or even thousands of classes of objects, which is the problem human vision systems deal with every day. In the worst case, each tree node may be traversed; however, this is rarely the case in practice.

The multi-class probabilistic boosting tree can be used for object categorization in accordance with the present invention. An example will now be described. FIG. 8 illustrates histograms of four object images in intensity and three Gabor filtering results. The images shown represent four categories: bonsai 802, 804, cougar body 806, 808, dollar bill 810, 812, and ketch 814, 816. Histograms are known to be robust against translation and rotation and to have good discriminative power. Histograms of different filter responses of an image serve as different cues, which can be used and combined to perform scene analysis and object recognition. To learn discriminative models, moments up to the 3rd order are computed for each histogram h(s) to make use of the integral image for fast computation.
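The integral-image trick works because the k-th raw moment of a region's histogram, $\sum_s s^k h(s)$, equals the mean of $v^k$ over the region's pixels, so one integral image per power yields any rectangle's moments in constant time. A minimal sketch with illustrative names follows; in practice the integral images would be precomputed once per cue image, not per rectangle.

```python
# Sketch of the histogram-moment features via integral images.
import numpy as np

def integral_image(img):
    return np.cumsum(np.cumsum(img.astype(np.float64), axis=0), axis=1)

def rect_sum(ii, x0, y0, x1, y1):
    """Sum over the half-open rectangle [y0, y1) x [x0, x1) in 4 lookups."""
    total = ii[y1 - 1, x1 - 1]
    if y0 > 0:
        total -= ii[y0 - 1, x1 - 1]
    if x0 > 0:
        total -= ii[y1 - 1, x0 - 1]
    if y0 > 0 and x0 > 0:
        total += ii[y0 - 1, x0 - 1]
    return total

def histogram_moments(cue, rect, max_order=3):
    """Raw moments of orders 1..max_order of cue's histogram inside rect."""
    x0, y0, x1, y1 = rect
    area = float((x1 - x0) * (y1 - y0))
    iis = [integral_image(cue ** k) for k in range(1, max_order + 1)]
    return [rect_sum(ii, x0, y0, x1, y1) / area for ii in iis]
```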

The goal is to learn a discriminative model which outputs the posterior distribution on the category label for each input image patch. Each object image is resized into an 80×80 patch. For each image patch, the cue images are the edge maps computed using a Canny edge detector at three scales, the orientations of the edges, and the filtering results of 10 Gabor filters. One thousand rectangles with different aspect ratios and sizes, centered at various locations, are placed in the 80×80 image patch. The features are the moments of the histogram of each rectangle on every cue image. The multi-class PBT then picks and combines these features, forming hierarchical classifiers.

For purposes of explanation, 29 out of a total of 80 categories are picked from the image dataset. There are 41 images for each category, some of which are shown in FIG. 9. The images are taken of objects at different view directions and illuminations. Next, 25 images are randomly picked out of each category for training. FIG. 9 shows the sample images 902 and clusters 904 formed in the boosting tree after learning. The approach of the present invention is capable of automatically discovering the intra-class similarity and the inter-class similarity and dissimilarity. For the images not picked for training, the recognition/categorization rate is tested. The category with the highest probability is considered a correct recognition. Table 1 below shows the recognition rate on the remaining 16 images for each category. The average recognition rate is 76%.

TABLE 1

| object | rate | object | rate | object | rate |
|---------|------|---------|------|----------|------|
| apple1 | 100% | cup1 | 100% | tomato3 | 100% |
| horse1 | 94% | pear10 | 94% | apple4 | 94% |
| pear3 | 94% | pear9 | 94% | cup4 | 88% |
| cow1 | 88% | pear8 | 88% | dog2 | 81% |
| car1 | 81% | pear1 | 81% | apple3 | 75% |
| car9 | 75% | tomato1 | 75% | tomato10 | 75% |
| horse3 | 75% | cup9 | 75% | dog10 | 69% |
| dog1 | 69% | horse8 | 69% | car11 | 56% |
| car11 | 56% | cow2 | 50% | cow10 | 44% |
| horse10 | 44% | cow8 | 19% | | |

In the next example, a more complicated image set is used, known as the Caltech-101 image categories. Some typical images are shown in FIG. 1. Instead of working on the original images, all of the images were cropped and resized to 80×80. Learning and testing are performed on the cropped images. Next, 25 images are randomly selected from each category for training. FIG. 10 shows some of the clusters formed after training. However, the clusters are sparser than the ones in the other image set described above, due to the complex object categories. For each category $y_j$, the histogram is computed:

$$h(N) = \sum_{i} \delta\left(N - N(x_i)\right), \qquad (6)$$

where N is a leaf node and N(x_i) is the leaf node at which training sample x_i is finally located. The entropy of h(N) indicates how tightly the samples of each category are grouped in the tree. For objects which are similar to each other within a category, tight clusters should be formed. Objects having large variation are more scattered among the tree. In Table 2 below, the third column after the category name gives the entropy measure for each category.

TABLE 2

| object | r1 | r2 | entropy | object | r1 | r2 | entropy |
|--------|-----|-----|---------|--------|-----|-----|---------|
| inline skate | 100% | 100% | 0.39 | garfield | 100% | 100% | 1.26 |
| yin yang | 83% | 83% | 0.52 | stop sign | 66% | 81% | 1.09 |
| revolver | 63% | 80% | 1.16 | metronome | 63% | 75% | 1.26 |
| dollar bill | 61% | 78% | 0.80 | motorbikes | 56% | 75% | 0.52 |
| . . . | | | | . . . | | | |
| joshua tree | 0% | 19% | 1.19 | beaver | 0% | 25% | 1.36 |
| chair | 0% | 9% | 1.75 | wild cat | 0% | 22% | 1.56 |
| crab | 0% | 0% | 1.92 | background | | | 2.82 |

Object categories like "yin yang" have very low entropy and, not surprisingly, the background category has the most variability and the highest entropy. This entropy measure loosely indicates how difficult it will be to recognize each category. The categorization/recognition results are shown in Table 2. The first column after the category name, r1, is the recognition rate when the discriminative model outputs as its category id the one with the highest probability. The average recognition rate for r1 is 20%; a random guess would yield a rate of around 1%. The second column, r2, is the categorization rate when the correct category id is among the top ten choices. The average rate for r2 is 40%.
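A sketch of how the tightness measure of Equation (6) could be computed, assuming the two-class Node/q_pos helpers from the earlier sketches and hard descent to the leaf N(x) that finally holds each sample; the multi-class case is analogous.

```python
# Sketch of the per-category leaf histogram h(N) and its entropy.
import numpy as np
from collections import Counter

def leaf_of(node, x):
    while node.learners is not None:
        go_right = q_pos(node.learners, x[None, :])[0] >= 0.5
        node = node.right if go_right else node.left
    return id(node)                      # any stable leaf identifier works

def category_entropy(tree, X_cat):
    """Entropy of h(N) over the training samples X_cat of one category."""
    counts = Counter(leaf_of(tree, x) for x in X_cat)
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())
```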

The present invention can be used for both object detection and classification. Some examples of applications for which the PBT is beneficial include multi-view face detection, localization of the left ventricle, and fetal measurements (e.g., fetal head, abdomen, and femur). The present invention can then further be used to classify the detected objects. For example, in the case of the left ventricle, the left ventricle can be classified as long vs. round. In the case of fetal measurements, the head measurements can be separated from the abdomen measurements.

The PBT is trained on a training set containing approximately 2,000 aligned positive samples and 90,000 negative samples, all of the same size. The negatives also contain shifted copies of the positive samples, for better localization. FIG. 11 illustrates a series of still images 1102 that represent an input video of a heart and the resulting images 1104 in which the left ventricle is detected in accordance with the present invention. For detection, the left ventricle is searched for in the input image 1102 at different locations, rotation angles, scales, and aspect ratios using a coarse-to-fine strategy. Each search location, rotation, scale, and aspect ratio corresponds to a (not necessarily horizontal) bounding box in which the left ventricle should reside. Examples of bounding boxes include 1106-1110. The trained PBT gives a probability for each such box, and the box with the highest probability is selected as the location, size, and orientation of the left ventricle. An example of a localization of a left ventricle 1202 is shown in FIG. 12. The "+"s indicate the location of the endocardial wall 1204. The bounding box 1206 represents the location of the left ventricle.
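A sketch of this exhaustive localization search follows; extract_patch and features are assumed helpers (crop/rotate/resize a candidate box, then compute its feature vector) that the text does not spell out, and the coarse-to-fine refinement is omitted for brevity.

```python
# Sketch: score a PBT posterior over candidate boxes and keep the argmax.
import itertools

def localize(tree, image, extract_patch, features,
             xs, ys, angles, scales, aspects):
    best_p, best_box = -1.0, None
    for x, y, a, s, r in itertools.product(xs, ys, angles, scales, aspects):
        box = (x, y, a, s, r)                     # center, rotation, scale, aspect
        patch = extract_patch(image, box)         # not necessarily horizontal
        p = pbt_posterior(tree, features(patch))  # probability object is here
        if p > best_p:
            best_p, best_box = p, box
    return best_box, best_p
```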

The same strategy is used for the localization of the fetal head, abdomen and femur from ultrasound data. FIGS. 13-15 show examples of localization of the fetal head 1302, fetal abdomen 1402 and femur 1502 in accordance with the present invention. As with the left ventricle, the fetal head, abdomen or femur is searched for in the input image at different locations, rotation angles, scales and aspect ratios using a coarse-to-fine strategy. Each search location, rotation, scale and aspect ratio corresponds to a bounding box in which the head, abdomen or femur should reside. The trained PBT gives a probability for each such box, and the box with the highest probability is selected as the location, size and orientation of the head, abdomen or femur.

The PBT can also be used for rectal tube detection from CT volumetric data. An example will now be described with reference to FIGS. 16 and 17. A set of 7,000 features invariant to axial rotation, based on gradient and curvature, was used for training. The training set contained approximately 20,000 tube segments as positive examples and 250,000 negative examples.

Typically, the search for tubes in the 3D data would involve a search over many locations, 3D directions and sizes of the tube, which is computationally prohibitive. Instead, a tensor voting strategy is used to propose 3D locations, directions and radii of the tube candidates. The trained PBT classifier is computed for each of the tube candidates, and those whose probability is larger than a threshold are selected as detected tubes, as shown by tubes 1602 and 1702 in FIGS. 16 and 17.
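A sketch of this verification step, with propose_candidates standing in for the tensor voting stage (assumed to yield candidate location/direction/radius triples) and pbt_posterior reused from the earlier sketch:

```python
# Sketch: keep tube candidates whose PBT posterior exceeds a threshold.
def detect_tubes(tree, volume, propose_candidates, features, threshold=0.5):
    detections = []
    for cand in propose_candidates(volume):
        p = pbt_posterior(tree, features(volume, cand))
        if p > threshold:                 # keep sufficiently probable tubes
            detections.append((cand, p))
    return detections
```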

As indicated above, the present invention can also be used for multi-view face detection. The trained PBT handles different facial views. FIG. 18 shows some examples of face detection results tested on a frontal and profile image set.

Having described embodiments for a method for computing multi-class discriminative models using a probabilistic boosting tree framework, it is noted that modifications and variations can be made by persons skilled in the art in light of the above teachings. It is therefore to be understood that changes may be made in the particular embodiments of the invention disclosed which are within the scope and spirit of the invention as defined by the appended claims. Having thus described the invention with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.

CLAIMS

1. A method for localization of an object in an image comprising the steps of: a). constructing a probabilistic boosting tree in which each node combines a number of weak classifiers into a strong classifier or conditional posterior probability; b). receiving at least one input image containing the object to be localized; c). identifying a bounding box in the input image in which the object should reside based on the conditional posterior probability; d). computing a probability value for the bounding box based on the likelihood that the object in fact resides in that location; e). repeating steps c).-d). at different locations in the input image; and f). selecting the bounding box with the highest computed probability as the location where the object resides.

2. The method of claim 1 wherein step e). further comprises the steps of: searching the at least one input image at different rotations in the image; and searching the at least one input image at different aspect ratios in the image.

3. The method of claim 1 wherein the weak classifiers represent features of the object.

4. The method of claim 1 wherein the object is an anatomical structure.

5. The method of claim 4 wherein the anatomical structure is a left ventricle.

6. The method of claim 4 wherein the anatomical structure is a fetus head.

7. The method of claim 4 wherein the anatomical structure is a fetus abdomen.

8. The method of claim 4 wherein the anatomical structure is a fetus femur.

9. The method of claim 4 wherein the anatomical structure is a face.

10. The method of claim 4 wherein the anatomical structure is a rectal tube.

11. A method for detecting an object in an image comprising the steps of: a). constructing a probabilistic boosting tree in which each node combines a number of weak classifiers into a strong classifier or conditional posterior probability; b). receiving at least one input image; c). identifying a bounding box in the at least one input image in which the object may reside based on the conditional posterior probability; d). computing a probability value for the bounding box based on the likelihood that the object resides in the image; e). comparing the probability to a predetermined threshold; f). maintaining the bounding box if the probability is above the predetermined threshold; g). repeating steps c).-f). at different locations in the image; and h). determining that the object resides in the image if the probability for at least one bounding box is above the predetermined threshold.

12. The method of claim 11 wherein step g). further comprises the steps of: searching the at least one input image at different rotations in the image; and searching the at least one input image at different aspect ratios in the image.

13. The method of claim 12 wherein the searching is performed in a coarse-to-fine manner.

14. The method of claim 11 wherein the weak classifiers represent features of the object.

15. The method of claim 11 wherein the object is an anatomical structure.

16. The method of claim 15 wherein the anatomical structure is a left ventricle.

17. The method of claim 15 wherein the anatomical structure is a fetus head.

18. The method of claim 15 wherein the anatomical structure is a fetus abdomen.

19. The method of claim 15 wherein the anatomical structure is a fetus femur.

20. The method of claim 15 wherein the anatomical structure is a face.

21. The method of claim 15 wherein the anatomical structure is a rectal tube.

22. A method of classifying images of objects into different image categories comprising the steps of: recursively constructing a probabilistic boosting tree in which each tree node is a strong classifier, a discriminative model being obtained at the top of the tree and each level of the tree comprising an augmented variable; dividing an input training set into two new sets according to a learned classifier; using the two new sets to train left and right sub-trees recursively, wherein clustering is automatically formed in a hierarchical way; and outputting an appropriate number of classifications based on a number of clusters formed.

23. The method of claim 22 wherein the probabilistic tree solves a two-class problem.

24. The method of claim 22 wherein the step of outputting an appropriate number of classifications comprises a positive class and a negative class.

25. The method of claim 22 wherein the probabilistic tree solves a multi-class problem.

26. The method of claim 25 wherein the step of outputting an appropriate number of classifications comprises multiple categories.

27. The method of claim 22 wherein the object is an anatomical structure.

28. The method of claim 27 wherein the anatomical structure is a left ventricle.

29. The method of claim 27 wherein the anatomical structure is a fetus head.

30. The method of claim 27 wherein the anatomical structure is a fetus abdomen.

31. The method of claim 27 wherein the anatomical structure is a fetus femur.

32. The method of claim 27 wherein the anatomical structure is a face.

33. The method of claim 27 wherein the anatomical structure is a rectal tube.